Results 1 - 7 of 7
1.
Mol Inform; 34(1): 44-52, 2015 Jan.
Article in English | MEDLINE | ID: mdl-27490861

ABSTRACT

Improved understanding of the forces that determine drug specificity for their targets is important for drug design and discovery, as well as for gaining knowledge about molecular recognition. Here, we present a machine learning approach that includes all approved drugs with a known protein target. The drugs were characterized using easily interpretable physico-chemical descriptors. Using the Random Forest method, we were able to predict whether a drug binds to a soluble or a membrane protein with an average accuracy of 84% and an average area under the ROC curve of 0.91. The high average performance suggests that there are some general physico-chemical differences between drugs that bind to membrane protein targets and drugs that bind to soluble protein targets. Variable importance measures combined with permutation tests were used to identify the most influential descriptors. This yielded six outstanding descriptors, all of which involve drug flexibility and lipophilicity, suggesting that drugs binding to membrane protein targets are in general more flexible and lipophilic, whereas drugs binding to soluble protein targets are more rigid and hydrophilic. Given the notion that ligands are, in general, blueprints of their protein pockets, we may also draw general conclusions about protein-pocket properties, which may add to the understanding of molecular recognition.
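
The abstract ranks descriptors by combining variable importance measures with permutation tests. A minimal Python sketch of one common variant, permutation importance, follows: shuffle one descriptor column and measure how much accuracy drops. It is an illustration under stated assumptions, not the study's exact procedure; the Random Forest itself is not reproduced, and `toy_model`, the descriptor values and the labels (1 = membrane target, 0 = soluble target) are hypothetical.

```python
import random
from typing import Callable, List, Sequence

def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that equal the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(
    predict: Callable[[List[float]], int],
    X: List[List[float]],
    y: List[int],
    feature: int,
    n_repeats: int = 20,
    seed: int = 0,
) -> float:
    """Mean drop in accuracy when one descriptor column is shuffled.

    A large drop means the model relies on that descriptor; a drop near zero
    means the descriptor contributes little to this model's predictions.
    """
    rng = random.Random(seed)
    baseline = accuracy(y, [predict(row) for row in X])
    drops = []
    for _ in range(n_repeats):
        column = [row[feature] for row in X]
        rng.shuffle(column)
        # Copy each row, replacing only the shuffled descriptor.
        X_perm = [row[:feature] + [v] + row[feature + 1:] for row, v in zip(X, column)]
        drops.append(baseline - accuracy(y, [predict(row) for row in X_perm]))
    return sum(drops) / n_repeats

# Hypothetical stand-in for a trained Random Forest: 1 (membrane) if descriptor 0 > 2.
def toy_model(row: List[float]) -> int:
    return 1 if row[0] > 2.0 else 0

X = [[0.5, 3.0], [1.0, 1.0], [3.0, 2.0], [4.0, 0.0]]  # [descriptor 0, descriptor 1]
y = [0, 0, 1, 1]
print(permutation_importance(toy_model, X, y, feature=1))  # 0.0 (model ignores it)
print(permutation_importance(toy_model, X, y, feature=0))  # non-negative; larger = more reliance
```

Any classifier exposing a per-row prediction function can be plugged in for `toy_model`; shuffling rather than deleting a column keeps the descriptor's distribution intact, so the model never has to be retrained.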


Subjects
Machine Learning; Membrane Proteins/genetics; Sequence Analysis, Protein/methods
2.
Curr Top Med Chem; 11(15): 1978-93, 2011.
Article in English | MEDLINE | ID: mdl-21470169

ABSTRACT

Chemogenomics is an emerging interdisciplinary field that lies at the interface of biology, chemistry, and informatics. Most currently used drugs are small molecules that interact with proteins. Understanding protein-ligand interaction is therefore central to drug discovery and design. In the subfield of chemogenomics known as proteochemometrics, protein-ligand interaction models are induced from data matrices that contain both protein and ligand information along with some experimentally measured variable. The two general aims of this quantitative multi-structure-property-relationship (QMSPR) modeling approach are to exploit sparse/incomplete information sources and to obtain more general models covering larger parts of the protein-ligand space than traditional approaches, which focus mainly on specific targets or ligands. The data matrices, usually obtained from multiple sparse/incomplete sources, typically contain series of proteins and ligands together with quantitative information about their interactions. A useful model should ideally be easy to interpret and generalize well to new, unseen protein-ligand combinations. Achieving this requires sophisticated machine-learning methods for model induction, combined with adequate validation. This review is intended as a guide to methods and data sources suitable for this kind of protein-ligand interaction modeling. An overview of the modeling process is presented, including data collection, protein and ligand descriptor computation, data preprocessing, and machine-learning model induction and validation. Concerns and issues specific to each step of this kind of data-driven modeling are discussed.
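
A proteochemometric data matrix pairs protein and ligand information for each measured interaction. The sketch below assumes descriptors are already computed as numeric vectors and builds one row per measured pair by concatenating protein and ligand descriptors and, optionally, their pairwise products (cross-terms), a frequently used way to let linear models represent protein-ligand interactions. The function names and toy descriptor values are hypothetical and are not taken from the review.

```python
from typing import Dict, List, Sequence, Tuple

def pcm_row(protein: Sequence[float], ligand: Sequence[float],
            cross_terms: bool = True) -> List[float]:
    """One row of a proteochemometric data matrix for a protein-ligand pair.

    Concatenates protein and ligand descriptors; if cross_terms is True, also
    appends every product protein[i] * ligand[j] so that a linear model can
    capture how protein and ligand properties act together.
    """
    row = list(protein) + list(ligand)
    if cross_terms:
        row += [p * l for p in protein for l in ligand]
    return row

def pcm_matrix(pairs: List[Tuple[str, str]],
               proteins: Dict[str, Sequence[float]],
               ligands: Dict[str, Sequence[float]],
               cross_terms: bool = True) -> List[List[float]]:
    """Build rows only for measured (protein_id, ligand_id) pairs.

    Unmeasured combinations simply have no row, which is how sparse,
    multi-source interaction data can still feed a single model.
    """
    return [pcm_row(proteins[p], ligands[l], cross_terms) for p, l in pairs]

proteins = {"P1": [2.0, -1.0]}          # hypothetical protein descriptors
ligands = {"L1": [3.0], "L2": [0.5]}    # hypothetical ligand descriptors
rows = pcm_matrix([("P1", "L1"), ("P1", "L2")], proteins, ligands)
print(rows[0])  # [2.0, -1.0, 3.0, 6.0, -3.0]
```

With p protein descriptors and q ligand descriptors, each row has p + q + p*q columns, so cross-terms are usually paired with descriptor selection or regularization in practice.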


Subjects
Drug Discovery/methods; Genomics/methods; Proteins/chemistry; Artificial Intelligence; Binding Sites; Databases, Protein; Drug Design; Ligands; Models, Molecular; Protein Conformation; Proteins/metabolism; Quantitative Structure-Activity Relationship
3.
Mol Inform; 29(6-7): 499-508, 2010 Jul 12.
Article in English | MEDLINE | ID: mdl-27463328

ABSTRACT

A proteochemometrics model was induced from all interaction data in the BindingDB database, comprising 7078 protein-ligand complexes in total, with representatives from all major drug target categories. Proteins were represented by alignment-independent sequence descriptors holding information on properties such as hydrophobicity, charge, and secondary structure. Ligands were represented by commonly used QSAR descriptors. The inhibition constant (pKi) values of the protein-ligand complexes were discretized into "high" and "low" interaction activity. Different machine-learning techniques were used to induce models relating protein and ligand properties to interaction activity. Decision trees performed best, giving an accuracy of 80% and an area under the ROC curve of 0.81. The tree pointed to the protein and ligand properties that are relevant for the interaction. Because the approach requires neither alignments nor knowledge of protein 3D structures, virtually all available protein-ligand interaction data could be utilized, opening the way to completely general interaction models that may span entire proteomes.
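
Alignment-independent protein descriptors map sequences of any length to vectors of the same length, which is what allows unrelated proteins to share one model. The simplified sketch below computes only two such descriptors (mean Kyte-Doolittle hydropathy and net charge per residue) and binarizes pKi. The hydropathy scale, the crude charge rule and the 7.0 pKi cutoff are assumptions chosen for illustration; the abstract does not give the study's exact descriptors or its discretization threshold.

```python
from typing import Dict, List

# Kyte-Doolittle hydropathy scale (one value per standard amino acid).
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def sequence_descriptors(seq: str) -> List[float]:
    """Fixed-length, alignment-independent descriptors for a protein sequence.

    Returns [mean hydropathy, net charge per residue]. Both depend only on
    composition, so sequences of any length yield vectors of equal length.
    """
    residues = [aa for aa in seq.upper() if aa in KYTE_DOOLITTLE]
    n = len(residues)
    if n == 0:
        raise ValueError("sequence contains no standard amino acids")
    mean_hydropathy = sum(KYTE_DOOLITTLE[aa] for aa in residues) / n
    # Crude charge near neutral pH: K and R count +1, D and E count -1; His is ignored.
    charge = sum((aa in "KR") - (aa in "DE") for aa in residues)
    return [mean_hydropathy, charge / n]

def discretize_pki(pki: float, threshold: float = 7.0) -> str:
    """Label an interaction 'high' or 'low' (the 7.0 cutoff is an assumption)."""
    return "high" if pki >= threshold else "low"

print(sequence_descriptors("II"))                 # [4.5, 0.0]
print(discretize_pki(8.2), discretize_pki(5.0))   # high low
```

A real descriptor set would add further scales (for example, secondary-structure propensities) and sequence-order terms, but the key property is the same: no alignment is needed.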

4.
BMC Bioinformatics; 10 Suppl 6: S13, 2009 Jun 16.
Article in English | MEDLINE | ID: mdl-19534738

ABSTRACT

BACKGROUND: Chemogenomics is an emerging interdisciplinary approach to drug discovery that combines traditional ligand-based approaches with biological information on drug targets and lies at the interface of chemistry, biology and informatics. The ultimate goal in chemogenomics is to understand molecular recognition between all possible ligands and all possible drug targets. Protein space and ligand space have previously been studied as separate entities, but chemogenomics studies deal with large datasets that cover parts of the joint protein-ligand space. Since drug discovery has traditionally focused on ligand optimization, chemical space has been studied extensively. Protein space has been studied to some extent, typically for the purpose of classifying proteins into functional and structural classes. Since chemogenomics deals not only with ligands but also with the macromolecules they interact with, it is of interest to find means to explore, compare and visualize protein-ligand subspaces.
RESULTS: Two chemogenomics protein-ligand interaction datasets were prepared for this study. The first dataset covers the known structural protein-ligand space and includes all non-redundant protein-ligand interactions found in the worldwide Protein Data Bank (PDB). The second dataset contains all approved drugs and drug targets stored in the DrugBank database and represents the approved drug-drug target space. To capture biological and physicochemical features of the chemogenomics datasets, sequence-based descriptors were computed for the proteins, and 0-, 1- and 2-dimensional descriptors for the ligands. Principal component analysis (PCA) was used to analyze the multidimensional data and to create global models of protein-ligand space. The nearest neighbour method, applied to the principal components, was used to obtain a measure of overlap between the datasets.
CONCLUSION: In this study, we present an approach to visualize protein-ligand spaces from a chemogenomics perspective, where both ligand and protein features are taken into account. The method can be applied to any protein-ligand interaction dataset. Here, the approach is applied to analyze the structural protein-ligand space and the protein-ligand space of all approved drugs and their targets. We show that this approach can be used to visualize and compare chemogenomics datasets, and possibly to identify cross-interaction complexes in protein-ligand space.
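
The overlap step can be illustrated with a small sketch, assuming each complex has already been projected onto its principal component scores (the PCA step itself is not shown). Here overlap is defined, hypothetically, as the fraction of points in one dataset whose nearest neighbour in the other dataset lies within a chosen radius; the abstract does not specify the exact overlap measure, and the dataset names and coordinates are illustrative only.

```python
import math
from typing import List, Sequence

Point = Sequence[float]

def nearest_distance(p: Point, others: List[Point]) -> float:
    """Euclidean distance from p to its nearest neighbour in `others`."""
    return min(math.dist(p, q) for q in others)

def overlap_fraction(a: List[Point], b: List[Point], radius: float) -> float:
    """Fraction of points in `a` whose nearest neighbour in `b` lies within `radius`.

    The measure is asymmetric: overlap_fraction(a, b, r) asks how much of `a`
    is covered by `b`, which need not equal overlap_fraction(b, a, r).
    """
    hits = sum(nearest_distance(p, b) <= radius for p in a)
    return hits / len(a)

# Hypothetical PC scores (PC1, PC2) for complexes from two datasets.
drugs_pc = [(0.0, 0.0), (5.0, 5.0)]
pdb_pc = [(0.0, 1.0), (10.0, 10.0)]
print(overlap_fraction(drugs_pc, pdb_pc, radius=1.0))  # 0.5
```

Computing the measure in PC space rather than on raw descriptors keeps distances dominated by the main directions of variation and makes the result easier to relate to the PCA plots.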


Subjects
Computational Biology/methods; Drug Discovery/methods; Genomics/methods; Proteins/chemistry; Binding Sites; Databases, Protein; Ligands
5.
J Chem Inf Model; 48(11): 2278-88, 2008 Nov.
Article in English | MEDLINE | ID: mdl-18937438

ABSTRACT

Chemogenomics is a new strategy in in silico drug discovery, where the ultimate goal is to understand molecular recognition for all molecules interacting with all proteins in the proteome. To study such cross-interactions, methods that can generalize over proteins that vary greatly in sequence, structure, and function are needed. We present a general quantitative approach to protein-ligand binding affinity prediction that spans the entire structural enzyme-ligand space. The model was trained on a data set of all available enzymes cocrystallized with druglike ligands (that is, complexes for which a crystal structure is available), taken from four publicly available interaction databases. Each enzyme was characterized by a set of local descriptors of protein structure that describe the binding site of the cocrystallized ligand. The ligands in the training set were described by traditional QSAR descriptors. To evaluate the model, a comprehensive test set of enzyme structures and ligands was manually curated. The test set contained enzyme-ligand complexes for which no crystal structures were available, so the binding modes were unknown. The test set enzymes were therefore characterized by matching their entire structures to the local descriptor library constructed from the training set. Both the training and the test set contained enzyme-ligand complexes from all major enzyme classes, and the enzymes spanned a large range of sequences and folds. The experimental binding affinities (pKi) ranged from 0.5 to 11.9 (0.7-11.0 in the test set). The induced model predicted the binding affinities of the external test set enzyme-ligand complexes with an r² of 0.53 and an RMSEP of 1.5. This demonstrates that the use of local descriptors makes it possible to create rough predictive models that can generalize over a wide range of protein targets.
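
The two statistics reported for the external test set can be computed as sketched below. RMSEP is the root-mean-square error of prediction; for r², the sketch assumes the squared Pearson correlation between observed and predicted pKi, since the abstract does not state which definition was used. The example values are hypothetical.

```python
import math
from typing import Sequence

def rmsep(y_obs: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error of prediction on an external test set."""
    n = len(y_obs)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / n)

def r_squared(y_obs: Sequence[float], y_pred: Sequence[float]) -> float:
    """Squared Pearson correlation between observed and predicted values (assumed definition)."""
    n = len(y_obs)
    mean_o = sum(y_obs) / n
    mean_p = sum(y_pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(y_obs, y_pred))
    var_o = sum((o - mean_o) ** 2 for o in y_obs)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov * cov / (var_o * var_p)

obs = [1.0, 2.0, 3.0]    # hypothetical experimental pKi values
pred = [2.0, 3.0, 4.0]   # hypothetical predictions, all 1 unit too high
print(r_squared(obs, pred), rmsep(obs, pred))  # 1.0 1.0
```

The example shows why both numbers are reported: predictions shifted by a constant 1 pKi unit correlate perfectly (r² = 1.0) yet still have an RMSEP of 1.0. Under the alternative definition 1 - SS_res/SS_tot, the same predictions would score -0.5.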


Subjects
Enzymes/chemistry; Models, Molecular; Animals; Artificial Intelligence; Cluster Analysis; Computer Simulation; Databases, Protein; Dihydroorotate Dehydrogenase; Drug Discovery; Enzymes/metabolism; Informatics; Kinetics; Ligands; Molecular Structure; Oxidoreductases Acting on CH-CH Group Donors/chemistry; Oxidoreductases Acting on CH-CH Group Donors/metabolism; Oxidoreductases Acting on CH-NH Group Donors/chemistry; Oxidoreductases Acting on CH-NH Group Donors/metabolism; Plasmodium falciparum/enzymology; Protein Conformation; Zea mays/enzymology; Polyamine Oxidase
6.
Proteins; 65(3): 568-79, 2006 Nov 15.
Article in English | MEDLINE | ID: mdl-16948162

ABSTRACT

Modeling and understanding protein-ligand interactions is one of the most important goals in computational drug discovery. To this end, proteochemometrics uses structural and chemical descriptors from several proteins and several ligands to induce interaction models. Current approaches are limited to families of closely related proteins. Here, we present a new, generalized approach in which proteins that vary greatly in sequence and structure are represented by a library of local substructures. Using linear regression and rule-based learning, we combine such local substructures with chemical descriptors of the ligands to model binding affinity for a training set of hydrolase and lyase enzymes. We evaluate the predictive performance of these models using cross-validation and sets of unseen ligands with unknown three-dimensional structure. The models are shown to generalize, outperforming models that use descriptors from only proteins or only ligands, as well as models that use global rather than local structure similarities. Thus, we demonstrate that this approach can describe dependencies between local structural properties and ligands in otherwise dissimilar protein structures. These dependencies are often, but not always, associated with local substructures that are in contact with the ligands. Finally, we show that strongly bound enzyme-ligand complexes require the presence of particular local substructures, while weakly bound complexes may be described by the absence of certain properties. The results demonstrate that the alignment-independent approach using local substructures can describe protein-ligand interactions for largely different proteins and hence opens up proteochemometric analysis of the interaction space of entire proteomes.


Subjects
Computational Biology/methods; Drug Design; Enzyme Inhibitors/chemistry; Enzymes/chemistry; Models, Molecular; Proteomics; Algorithms; Animals; Binding Sites; Databases, Protein; Humans; Ligands; Protein Binding; Protein Conformation; Proteins/chemistry
7.
Proteins; 63(1): 24-34, 2006 Apr 01.
Article in English | MEDLINE | ID: mdl-16435365

ABSTRACT

G-protein-coupled receptors (GPCRs) are among the most important drug targets. Because of a shortage of 3D crystal structures, most drug design for GPCRs has been ligand-based. We propose a novel, rough-set-based proteochemometric approach to the study of receptor and ligand recognition. The approach is validated on three datasets containing GPCRs. In proteochemometrics, properties of receptors and ligands are used in conjunction and modeled to predict binding affinity. The rough set (RS) rule-based models presented herein consist of minimal decision rules that associate properties of receptors and ligands with high or low binding affinity. The information provided by the rules is then used to develop a mechanistic interpretation of the interactions between the ligands and receptors in the datasets. The first two datasets contained descriptors of melanocortin receptors and peptide ligands. The third dataset contained descriptors of adrenergic receptors and ligands. All the rule models induced from these datasets have high predictive quality. An example of a decision rule is "If R1_ligand(Ethyl) and TM helix 2 position 27(Methionine) then Binding(High)." The easily interpretable rule sets are able to identify determinative receptor and ligand parts. For instance, all three models suggest that transmembrane helix 2 is determinative for high and low binding affinity. The RS models show that it is possible to use rule-based models to predict ligand-binding affinities. The models may be used to gain a deeper biological understanding of the combinatorial nature of receptor-ligand interactions.
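
A decision rule such as the one quoted in the abstract can be represented as a conjunction of attribute-value conditions. The sketch below only applies and scores a given rule (its support and accuracy on labelled examples); it does not perform rough-set rule induction. The attribute names, such as the `TM2_pos27` encoding of "TM helix 2 position 27", and the example complexes are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class Rule:
    # (attribute, required value) pairs, all of which must hold (logical AND).
    conditions: Tuple[Tuple[str, str], ...]
    decision: str  # e.g. "High" or "Low" binding affinity

    def matches(self, obj: Dict[str, str]) -> bool:
        """True if the object satisfies every condition of the rule."""
        return all(obj.get(attr) == value for attr, value in self.conditions)

def support_and_accuracy(rule: Rule, objects: List[Dict[str, str]],
                         labels: List[str]) -> Tuple[int, float]:
    """Support = number of matched objects; accuracy = share of them labelled rule.decision."""
    matched = [lab for obj, lab in zip(objects, labels) if rule.matches(obj)]
    if not matched:
        return 0, 0.0
    return len(matched), sum(lab == rule.decision for lab in matched) / len(matched)

rule = Rule((("R1_ligand", "Ethyl"), ("TM2_pos27", "Methionine")), "High")
complexes = [  # hypothetical receptor-ligand complexes
    {"R1_ligand": "Ethyl", "TM2_pos27": "Methionine"},
    {"R1_ligand": "Methyl", "TM2_pos27": "Methionine"},
    {"R1_ligand": "Ethyl", "TM2_pos27": "Methionine"},
]
labels = ["High", "Low", "Low"]
print(support_and_accuracy(rule, complexes, labels))  # (2, 0.5)
```

Keeping each rule as plain attribute-value pairs is what makes such models easy to read: every condition names a concrete receptor position or ligand substituent.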


Subjects
Computational Biology/methods; Proteomics/methods; Receptors, G-Protein-Coupled/chemistry; Algorithms; Animals; Area Under Curve; Databases, Protein; Humans; Hydrogen-Ion Concentration; Ligands; Models, Biological; Models, Chemical; Models, Molecular; Molecular Conformation; Peptides/chemistry; Protein Binding; Protein Conformation; Protein Structure, Tertiary; alpha-MSH/chemistry